 game-theoretic interaction


Fine-Grained Semantically Aligned Vision-Language Pre-Training

Neural Information Processing Systems

Large-scale vision-language pre-training has shown impressive advances in a wide range of downstream tasks. Existing methods mainly model the cross-modal alignment by the similarity of the global representations of images and text, or advanced cross-modal attention upon image and text features. However, they fail to explicitly learn the fine-grained semantic alignment between visual regions and textual phrases, as only global image-text alignment information is available. In this paper, we introduce LOUPE, a fine-grained semantically aLigned visiOn-langUage PrE-training framework, which learns fine-grained semantic alignment from the novel perspective of game-theoretic interactions. To efficiently estimate the game-theoretic interactions, we further propose an uncertainty-aware neural Shapley interaction learning module. Experiments show that LOUPE achieves state-of-the-art performance on a variety of vision-language tasks. Without any object-level human annotations and fine-tuning, LOUPE achieves competitive performance on object detection and visual grounding. More importantly, LOUPE opens a new promising direction of learning fine-grained semantics from large-scale raw image-text pairs.
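As a rough illustration of what a game-theoretic interaction between a visual region and a textual phrase could mean, the sketch below computes the exact pairwise Shapley interaction index by brute force on a tiny toy game. It is not LOUPE's method: the abstract describes a neural, uncertainty-aware estimator, whereas this enumerates every coalition. The player names (`r_dog`, `r_sky`, `t_dog`) and the function `toy_alignment_score` are hypothetical and exist only for this example.

```python
from itertools import combinations
from math import factorial


def shapley_interaction(value, players, i, j):
    """Exact pairwise Shapley interaction index of players i and j.

    `value` maps a frozenset of players to a real-valued payoff.
    The cost is exponential in the number of players, so this is
    only usable for toy games.
    """
    others = [p for p in players if p not in (i, j)]
    n = len(players)
    total = 0.0
    for size in range(len(others) + 1):
        # Weight of a context coalition S with |S| = size.
        weight = factorial(size) * factorial(n - size - 2) / factorial(n - 1)
        for subset in combinations(others, size):
            s = frozenset(subset)
            # Joint effect of i and j minus their individual effects.
            delta = (value(s | {i, j}) - value(s | {i})
                     - value(s | {j}) + value(s))
            total += weight * delta
    return total


def toy_alignment_score(coalition):
    """Hypothetical image-text score over visible regions and phrases."""
    score = 0.0
    # The dog region and the phrase "dog" only pay off together.
    if "r_dog" in coalition and "t_dog" in coalition:
        score += 1.0
    # The sky region contributes on its own, independent of the text.
    if "r_sky" in coalition:
        score += 0.2
    return score


players = ["r_dog", "r_sky", "t_dog"]
aligned = shapley_interaction(toy_alignment_score, players, "r_dog", "t_dog")
unaligned = shapley_interaction(toy_alignment_score, players, "r_sky", "t_dog")
print(aligned, unaligned)
```

In this toy game the matched region-phrase pair receives a clearly positive interaction while the unrelated pair receives roughly zero, which is the kind of signal a fine-grained alignment objective could exploit.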


GISExplainer: On Explainability of Graph Neural Networks via Game-theoretic Interaction Subgraphs

Xian, Xingping, Liu, Jianlu, Wang, Chao, Wu, Tao, Qiao, Shaojie, Tang, Xiaochuan, Liu, Qun

arXiv.org Artificial Intelligence

Explainability is crucial for the application of black-box Graph Neural Networks (GNNs) in critical fields such as healthcare, finance, cybersecurity, and more. Various feature attribution methods, especially the perturbation-based methods, have been proposed to indicate how much each node/edge contributes to the model predictions. However, these methods fail to generate connected explanatory subgraphs that consider the causal interaction between edges within different coalition scales, which results in unfaithful explanations. In our study, we propose GISExplainer, a novel game-theoretic interaction based explanation method that uncovers what the underlying GNNs have learned for node classification by discovering human-interpretable causal explanatory subgraphs. First, GISExplainer defines a causal attribution mechanism that considers the game-theoretic interaction of multi-granularity coalitions in the candidate explanatory subgraph to quantify the causal effect of an edge on the prediction. Second, GISExplainer assumes that the coalitions with negative effects on the predictions are also significant for model interpretation, and that the contribution of the computation graph stems from the combined influence of both positive and negative interactions within the coalitions. Then, GISExplainer regards the explanation task as a sequential decision process, in which a salient edge is successively selected and connected to the previously selected subgraph based on its causal effect to form an explanatory subgraph, ultimately striving for better explanations. Additionally, an efficiency optimization scheme is proposed for the causal attribution mechanism through coalition sampling. Extensive experiments demonstrate that GISExplainer achieves better performance than state-of-the-art approaches w.r.t. two quantitative metrics: Fidelity and Sparsity.
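The sequential selection step described above can be sketched as a greedy loop that only ever adds an edge touching the subgraph selected so far, which keeps the explanation connected. This is a simplified sketch under assumptions, not GISExplainer's implementation: the function name `greedy_connected_subgraph`, the `edge_score` callback (a stand-in for the causal effect of an edge), and the `budget` parameter are all hypothetical.

```python
def greedy_connected_subgraph(edges, target, edge_score, budget):
    """Grow a connected explanatory subgraph around `target`.

    edges      : iterable of (u, v) tuples (undirected)
    target     : node whose prediction is being explained
    edge_score : callable(edge, selected_edges) -> float; a placeholder
                 for a causal-effect estimate of adding `edge`
    budget     : maximum number of edges to select
    """
    selected = []
    nodes = {target}
    remaining = set(edges)
    while remaining and len(selected) < budget:
        # Only edges adjacent to the current subgraph are candidates.
        # Sorting makes tie-breaking deterministic.
        frontier = [e for e in sorted(remaining)
                    if e[0] in nodes or e[1] in nodes]
        if not frontier:
            break
        best = max(frontier, key=lambda e: edge_score(e, selected))
        selected.append(best)
        nodes.update(best)
        remaining.remove(best)
    return selected


# Toy usage with fixed, made-up scores per edge.
toy_edges = [(0, 1), (1, 2), (2, 3), (4, 5)]
toy_scores = {(0, 1): 0.9, (1, 2): 0.1, (2, 3): 0.8, (4, 5): 1.0}
explanation = greedy_connected_subgraph(
    toy_edges, target=0,
    edge_score=lambda e, chosen: toy_scores[e], budget=3)
print(explanation)
```

Note that the edge `(4, 5)` has the highest score in the toy data but is never chosen, because it is disconnected from the target node; a purely score-ranked attribution method would have included it.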


Can the Inference Logic of Large Language Models be Disentangled into Symbolic Concepts?

Shen, Wen, Cheng, Lei, Yang, Yuxiao, Li, Mingjie, Zhang, Quanshi

arXiv.org Artificial Intelligence

In this paper, we explain the inference logic of large language models (LLMs) as a set of symbolic concepts. Many recent studies [4, 9, 10] have discovered that traditional DNNs usually encode sparse symbolic concepts. However, because an LLM has many more parameters than traditional DNNs, whether the LLM also encodes sparse symbolic concepts is still an open problem. Therefore, in this paper, we propose to disentangle the inference score of LLMs for dialogue tasks into a small number of symbolic concepts. We verify that those sparse concepts can be used to accurately estimate the inference scores of the LLM on all arbitrarily masked states of the input sentence. We also evaluate the transferability of concepts encoded by an LLM and verify that symbolic concepts usually exhibit high transferability across similar input sentences. More crucially, those symbolic concepts can be used to explain the exact reasons accountable for the LLM's prediction errors.
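To make the claim about estimating scores on masked inputs concrete, here is a minimal sketch that decomposes a toy scoring function into interaction effects and then reconstructs the score of every masked state from them. It assumes the Harsanyi dividend as the interaction measure, which the abstract itself does not name, and the scoring function `toy_score` and its three-word input are hypothetical stand-ins for an LLM's inference score.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def harsanyi_interactions(value, players):
    """Harsanyi dividend of each coalition S:
    I(S) = sum over T subset of S of (-1)^(|S|-|T|) * value(T)."""
    return {
        s: sum((-1) ** (len(s) - len(t)) * value(t) for t in subsets(s))
        for s in subsets(players)
    }


def reconstruct(interactions, visible):
    """Score of a masked input = sum of interactions fully contained in it."""
    return sum(w for s, w in interactions.items() if s <= visible)


def toy_score(visible):
    """Hypothetical model score given the set of unmasked words."""
    score = 0.0
    if "good" in visible:
        score += 1.0
    if {"not", "good"} <= visible:
        score -= 2.0  # negation flips the sentiment
    return score


words = ("not", "good", "movie")
effects = harsanyi_interactions(toy_score, words)
# Keep only the salient interactions ("concepts").
concepts = {s: w for s, w in effects.items() if abs(w) > 1e-9}
print({tuple(sorted(s)): w for s, w in concepts.items()})
```

Of the eight possible coalitions, only two carry a non-zero effect here (the word "good" alone and the pair "not" + "good"), yet summing them reproduces the toy score for every masked state. That is the sparsity-plus-faithfulness property the abstract reports checking for LLMs.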